Feature selection and classification of urinary mRNA microarray data by iterative random forest to diagnose renal fibrosis: a two-stage study

Authors

  • Le-Ting Zhou
  • Yu-Han Cao
  • Lin-Li Lv
  • Kun-Ling Ma
  • Ping-Sheng Chen
  • Hai-Feng Ni
  • Xiang-Dong Lei
  • Bi-Cheng Liu
Abstract

Renal fibrosis is a common pathological pathway of progressive chronic kidney disease (CKD). However, kidney function parameters are suboptimal for detecting early fibrosis, and therefore, novel biomarkers are urgently needed. We designed a 2-stage study and constructed a targeted microarray to detect urinary mRNAs of CKD patients with renal biopsy and healthy participants. We analysed the microarray data by an iterative random forest method to select candidate biomarkers and produce a more accurate classifier of renal fibrosis. Seventy-six and 49 participants were enrolled into stage I and stage II studies, respectively. By the iterative random forest method, we identified a four-mRNA signature in urinary sediment, including TGFβ1, MMP9, TIMP2, and vimentin, as important features of tubulointerstitial fibrosis (TIF). All four mRNAs significantly correlated with TIF scores and discriminated TIF with high sensitivity, which was further validated in the stage-II study. The combined classifiers showed excellent sensitivity and outperformed serum creatinine and estimated glomerular filtration rate measurements in diagnosing TIF. Another four mRNAs significantly correlated with glomerulosclerosis. These findings showed that urinary mRNAs can serve as sensitive biomarkers of renal fibrosis, and the random forest classifier containing urinary mRNAs showed favourable performance in diagnosing early renal fibrosis.
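The abstract does not spell out the iterative random forest procedure, so the following is only a minimal sketch of one common reading of the idea: fit a random forest, drop the least important features, and repeat until a small signature remains. It assumes scikit-learn and uses synthetic data. The function name `iterative_rf_select`, the parameters `n_keep` and `drop_frac`, and the `mRNA_i` feature names are hypothetical and are not taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def iterative_rf_select(X, y, names, n_keep=4, drop_frac=0.2, seed=0):
    """Repeatedly fit a random forest and discard the least important features.

    Hypothetical helper: an illustration of backward elimination driven by
    random forest importances, not the authors' published procedure.
    """
    keep = list(range(X.shape[1]))  # column indices still in play
    while len(keep) > n_keep:
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X[:, keep], y)
        # Positions within `keep`, sorted from least to most important.
        order = np.argsort(rf.feature_importances_)
        # Drop a fraction per round, but never go below n_keep features.
        n_drop = max(1, int(len(keep) * drop_frac))
        n_drop = min(n_drop, len(keep) - n_keep)
        dropped = set(order[:n_drop].tolist())
        keep = [f for i, f in enumerate(keep) if i not in dropped]
    return [names[f] for f in keep]


# Synthetic stand-in for a targeted microarray: 76 samples, 30 candidate mRNAs.
X, y = make_classification(
    n_samples=76, n_features=30, n_informative=4,
    n_redundant=0, shuffle=False, random_state=0,
)
names = [f"mRNA_{i}" for i in range(X.shape[1])]
selected = iterative_rf_select(X, y, names)
print(selected)
```

In a two-stage design like the one described, a signature chosen this way on the first cohort would then be evaluated on an independent second cohort, so that the reported performance is not inflated by the selection step.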

Similar articles

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression profiling gives access to a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce high-dimensional data with a statistical method, select high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality leads to a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining, research has become focused on microarrays. Microarrays are datasets with a large number of genes, most of which are irrelevant to the output class; hence, gene selection (feature selection) is essential. Removing redundant genes can increase the speed and accuracy of classification. After applying the gene se...

Journal title:

Volume 7, Issue

Pages -

Publication date: 2017